Author: Wei Li, weililab.org
Note This R markdown file provides a simple and convenient web report for the MAGeCK results. For experienced users, MAGeCKFlute R package is recommended as it provides additional visualization functionalities.
comparison_name is the prefix of your output file, defined by the “-n” parameter in your “mageck test” command. The system will look for the following files to generate this report:
# define the comparison_name here; for example,
# comparison_name='demo'
comparison_name='D34_D4_w_controls'
FDR cutoff is used to draw a boundary line in RRA or p value plots. Set it to -1 to disable the boundary line.
fdrcutoff=0.05
# fdrcutoff=-1 # disable FDR cutoff line
Reading input files. If any of these files are problematic, an error message will be shown below.
gstable=read.table(gene_summary_file,header = T,as.is = T,na.strings='')
sg_table=read.table(sgrna_summary_file,header = T,as.is = T,na.strings='')
comp_samples=getcomparisonsfromlogfile(log_file)
collabel=c(comp_samples[[1]],comp_samples[[2]])
The samples used in the comparison is indicated as follows.
| Sample | Type |
|---|---|
| D4-REP1 | control |
| D4-REP2 | control |
| D4-REP3 | control |
| D34-REP1 | treatment |
| D34-REP2 | treatment |
| D34-REP3 | treatment |
The statistics of comparisons is as indicated in the following table.
| Comparison | Genes | Selection | FDR1% | FDR5% | FDR25% |
|---|---|---|---|---|---|
| D34_D4_w_controls | 1210 | neg | 0 | 0 | 0 |
| D34_D4_w_controls | 1210 | pos | 194 | 233 | 233 |
The meanings of the columns are as follows.
The following figures show:
The following genes are used in the plot (change it as you like). By default, it is the top 5 genes in negatively selected genes.
targetgenelist_neg=gstable[gstable[,6]<=5,1]
# or, directly specify the genes to be plotted
#targetgenelist_neg=c("ACTR8","ACIN1")
# display genes used in the plot
print(targetgenelist_neg)
## [1] "V173E" "V172D" "R174R" "V172A" "V173Ter"
The following figure plots the distribution of RRA scores across these genes. Dotted lines represent the FDR cutoff line defined by the “fdrcutoff” value in the “Paramters” section.
The following figure plots the distribution of p values in these genes. Dotted lines represent the FDR cutoff line defined by the “fdrcutoff” value in the “Paramters” section.
The following genes are used in the plot (change it as you like). By default, it is the top 5 genes in negatively selected genes.
targetgenelist_pos=gstable[gstable[,12]<5,1]
# or, directly specify the genes to be plotted
#targetgenelist_pos=c("ACTR8","ACIN1")
# display genes used in the plot
print(targetgenelist_pos)
## [1] "V197L" "D393N" "R196Q" "L22L"
The following figure plots the distribution of RRA scores across these genes. Dotted lines represent the FDR cutoff line defined by the “fdrcutoff” value in the “Paramters” section.
The following figure plots the distribution of p values in these genes. Dotted lines represent the FDR cutoff line defined by the “fdrcutoff” value in the “Paramters” section.
The following figures show the distribution of sgRNA read counts (normalized) of selected genes in selected samples.
for(target_gene in c(targetgenelist_neg,targetgenelist_pos)){
plotindvidualsgrnas(sg_table,target_gene,collabel)
}
The following figures show the distribution of sgRNA log2 fold changes of selected genes in selected samples.
for(target_gene in c(targetgenelist_neg,targetgenelist_pos)){
plotindvidualsgrnas_lfc(sg_table,target_gene,collabel)
}
The following table and figure uses Gene Set Enrichment Analysis (GSEA) to estimate the dropout of core essential genes (defined by the Johannes Zuber lab and also used in our MAGeCKFlute R package).
Note: fgsea R package is required.
library(fgsea)
#core_ess_gene=read.table('~/Dropbox/work/TengFei/design/mageck-flute/Core_Essentialome_Zuberlab_v2.txt',sep='\t',header = T,quote='',comment.char = '')
gene_id=gstable$id
if(sum(gene_id%in%core_ess_gene_symbol)<6){
print('Not enough gene symbols found in essential gene list. Skip the enrichment step.')
}else{
ranks=log10(gstable$neg.score)
ranks=nrow(gstable):1/nrow(gstable)
ranks=ranks*2-1
names(ranks)=gstable$id
gset=gstable$id[gene_id%in%core_ess_gene_symbol]
gsetlist=list(zuber=gset)
#fgseaRes <- fgsea(gsetlist, ranks, minSize=15, maxSize = 500, nperm=1000)
fgseaRes <- fgseaMultilevel(gsetlist, ranks, minSize=15, maxSize = 500,scoreType = 'std',eps = 0)
fgseaRes=fgseaRes[order(fgseaRes$pval),]
#print(head(fgseaRes))
print(kable(fgseaRes,caption='Enrichment results'))
plotEnrichment(gsetlist$zuber, ranks)
}
## [1] "Not enough gene symbols found in essential gene list. Skip the enrichment step."